Rare events and long-term risks

The GitHub repo of the first part of the course contains the HTML and Quarto notebooks for the code. All the academic references (papers) and many others can be accessed in the following zip file.

NOTE: The present notebook is coded in R. It relies heavily on the tidyverse ecosystem of packages. We load the tidyverse below as a prerequisite for the rest of the notebook - along with a few other libraries.

\(\rightarrow\) Don’t forget that code flows sequentially. A given chunk may not work if the previous ones have not been executed.

library(tidyverse)   # Package for data wrangling
library(readxl)      # Package to import MS Excel files
library(latex2exp)   # Package for LaTeX expressions
library(quantmod)    # Package for stock data extraction
library(highcharter) # Package for reactive plots
library(roll)        # Package for rolling simple operations
library(broom)       # Package for grouped regressions

1 Context

1.1 Market turmoil

Rare events are sometimes called Black Swans in financial economics. The most visible cases are stock market crashes, which come in several flavours.

Let’s have a look. Below, we extract the time-series of the S&P500, one of the most important indices in US equity markets. In fact, we work with the corresponding Exchange-Traded Fund, SPY, downloaded with the {quantmod} package.

min_date <- "1990-01-01"
max_date <- "2024-01-05"
prices <- getSymbols("SPY", src = 'yahoo',  # Yahoo source (ETF ticker) ^GSPC
                     from = min_date, 
                     to = max_date,
                     auto.assign = TRUE, 
                     warnings = FALSE) %>%
  map(~Ad(get(.))) %>% 
  reduce(merge) %>%
  `colnames<-`("SPY")
highchart(type = "stock") %>%
  hc_title(text = "Evolution of the SPY") %>%
  hc_add_series(prices) %>%
  hc_tooltip(pointFormat = '{point.y:.3f}') # To round the y-axis numbers

Next, we depict the distribution of returns:

\[r_t=\frac{p_t-p_{t-1}}{p_{t-1}}\]

Note that, for simplicity, it is customary to assume that returns follow a Gaussian distribution.

returns_d <- prices %>%                           # Formula for returns
  data.frame(Date = index(.)) %>%                 # Adding the Date as a column
  mutate(SPY = SPY / dplyr::lag(SPY) - 1) %>%     # Returns
  na.omit()                                       # Removing NA rows
m <- mean(returns_d$SPY)                          # Average daily return
s <- sd(returns_d$SPY)                            # Volatility = sd of daily returns

returns_d %>% ggplot() +                          # Plot
  geom_histogram(aes(x = SPY, y = ..density..), bins = 100, alpha = 0.6) + theme_light() +
  stat_function(fun = dnorm, args = list(mean = m, sd = s), aes(color = "Gaussian")) +
  theme(legend.position = c(0.7, 0.7))

And if we zoom on the left tail (very bad days!), we see the problem of Gaussian models: large negative returns basically have a zero probability of occurrence.

dens2 <- function(x){  # Gaussian density, rescaled to the plotted interval
  dnorm(x, mean = m, sd = s) / mean(returns_d$SPY < -0.02 & returns_d$SPY > -0.12, na.rm = TRUE)
}
returns_d %>% ggplot() +                         # Plot
  geom_histogram(aes(x = SPY, y = ..density..), bins = 100, alpha = 0.6) + theme_light() +
  stat_function(fun = dens2, aes(color = "Gaussian")) + 
  theme(legend.position = c(0.4, 0.4)) + 
  xlim(-0.12,-0.02)

And it is precisely because people use Gaussian models that they underestimate the likelihood of extreme risks.
For instance, consider daily returns with a mean of zero (this does not change the results) and a volatility (standard deviation) of 15% annually, i.e., roughly 1% daily (we divide by the square root of 252, the number of trading days in one year in the US). Then a return of -4% or worse has a probability of about \(3.2\times 10^{-5}\), which means once every 125 years. But in practice, these levels occur almost every year… This is why heavy-tailed distributions, such as the Student distribution (see the plot below), the stable distribution, or even the generalized hyperbolic distribution, are much better choices.
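These back-of-the-envelope numbers can be verified with the standard normal CDF alone (a quick sketch, written here in plain Python so it runs standalone; 252 trading days per year, as above):

```python
from math import erfc, sqrt

# P(r <= -4%) for a Gaussian with mean 0 and daily sd of 1%:
# standardize to z = -0.04 / 0.01 = -4 and use Phi(z) = erfc(-z / sqrt(2)) / 2.
p_tail = erfc(4 / sqrt(2)) / 2
years_between_events = 1 / (p_tail * 252)   # 252 trading days per year

print(f"tail probability: {p_tail:.2e}")                        # about 3.2e-05
print(f"expected once every {years_between_events:.0f} years")  # about 125
```

Under the Gaussian assumption, a -4% day is a six-generation event; the data says otherwise.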

dens2 <- function(x){  # Gaussian density, rescaled to the plotted interval
  dnorm(x, mean = m, sd = s) / mean(returns_d$SPY < -0.02 & returns_d$SPY > -0.12, na.rm = TRUE)
}
dens_t <- function(x, m, s, df){dt((x-m)/s, df)/s}
dens3 <- function(x){  # Student density, rescaled to the plotted interval
  dens_t(x, m, s, 5) / mean(returns_d$SPY < -0.02 & returns_d$SPY > -0.12, na.rm = TRUE)
}

returns_d %>% ggplot() +                         # Plot
  geom_histogram(aes(x = SPY, y = ..density..), bins = 100, alpha = 0.6) + theme_light() +
  stat_function(fun = dens2, aes(color = "Gaussian")) + 
  stat_function(fun = dens3, aes(color = "Student (df = 5)")) + 
  theme(legend.position = c(0.4, 0.4)) + 
  xlim(-0.12,-0.02)

The reasons underpinning sharp negative returns are diverse:
- overvaluation (tech bubble ~2001);
- systemic problem (subprimes ~2008);
- unforeseen pandemic (Covid in 2020);
- wars & major geopolitical shocks (Ukraine in 2022).

None of these causes could have been (or were) predicted by standard economic models or statistical tools.

1.2 Climate risks

Until now, we have mostly focused on the frequency of adverse events. Gaussian models underestimate this frequency, whereas the data tells us that significant risks occur more often.

Another angle on extreme risks is severity. Indeed, the expected total loss over a given horizon is the magnitude of the loss times its probability of occurrence. Hence, the magnitude of extreme events matters too, of course.
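As an order-of-magnitude illustration of this frequency-times-severity decomposition (hypothetical numbers, sketched in Python):

```python
prob, loss = 0.01, 0.25               # hypothetical: 1% annual probability, 25% loss
expected_annual_loss = prob * loss    # frequency times severity
print(f"{expected_annual_loss:.2%}")  # 0.25% per year
```

Doubling either the frequency or the severity doubles the expected loss, which is why both dimensions matter.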

This is where climate risks kick in. Heatwaves, droughts, floods, hurricanes, etc. All these weather-induced catastrophes have an impact on economic activity. In the past, they were relatively rare indeed, but as climate change intensifies, they may become not only more frequent, but also more devastating.

We provide below two illustrations (see references below). First, a visual summary of disasters across the globe in 2021 alone.

And next, the quantification of damage in terms of human lives and infrastructure costs.

2 Consumption disaster risk models

2.1 Continuous time

Here, we follow Tsai & Wachter 2015.

In financial economics, everyday uncertainty is modelled (in continuous time) with Brownian motions, often written \(B_t\) - here in one dimension.
A rare event requires a jump process, and the most straightforward way to proceed is with a Poisson process, \(N_t\). This process has independent increments but depends on a parameter, the intensity, \(\lambda\) (or \(\lambda_t\) if it is time-dependent). When \(\lambda\) is fixed, \(\mathbb{E}[N_t]=\lambda t\) and \(\mathbb{P}[N_t=n]=\frac{(\lambda t)^n}{n!}e^{-\lambda t}\).

First, we consider a state variable, which is often taken to be aggregate consumption (again!). It is assumed to be driven by the following stochastic differential equation (SDE):

\[\frac{dC_t}{C_{t-}}=\mu dt + \sigma dB_t + (e^Z-1)dN_t, \tag{1}\]

where:

  • \(\mu\) is the drift, i.e., the deterministic rate of increase;
  • \(\sigma\) is the everyday (diffusive) volatility;
  • \((e^Z-1)dN_t\) models the extreme events. The random variable \(Z<0\) drives the distribution of the loss and \(N_t\) its occurrence and frequency;
  • the notation \(t-\) simply means before the occurrence of a potential jump at time \(t\).

Next, Tsai & Wachter 2015 make a strong assumption with \(D_t=C_t^\phi\) with \(\phi >0\), i.e., dividends are a power function of consumption. Using Ito’s lemma for jump-diffusion processes1, we derive the dynamics of dividends:

\[\frac{dD_t}{D_{t-}}=\left(\phi \mu + \frac{1}{2}\phi(\phi-1)\sigma^2 \right)dt + \phi \sigma dB_t + (e^{\phi Z}-1)dN_t, \tag{2}\]

Lastly, \(S_t\) is the price of the dividend claim and it is assumed (for simplicity again) that \(S_t/D_t=c\) is constant. Hence

\[\frac{dS_t}{S_{t-}}=\mu_S dt + \phi \sigma dB_t + (e^{\phi Z}-1)dN_t, \tag{3}\]

where the drift \(\mu_S\) equals the dividend drift \(\mu_D=\phi \mu + \frac{1}{2}\phi(\phi-1)\sigma^2\), the price-dividend ratio being constant.

Equation 3 defines the infinitesimal return of the asset. If we add the deterministic continuously-paid dividend yield, we have the average instantaneous return:

\[r_S=\frac{1}{dt}\,\mathbb{E}\left[ \frac{dS_t+D_t dt}{S_{t-}}\right]=\mu_S +\lambda \mathbb{E}[e^{\phi Z}-1]+c^{-1}\]

It can then be shown (see the Appendix of Tsai & Wachter 2015 for the complete proof) that the equity premium (over the risk-free rate \(r\)) is:

\[r_S-r=\gamma \phi \sigma^2-\lambda \mathbb{E}[(e^{-\gamma Z}-1)(e^{\phi Z}-1)] \tag{4}\]

where \(\gamma\) is the CRRA risk aversion (utility over consumption). If no disaster materializes in the sample, the measured premium is \[r_S-r=\gamma \phi \sigma^2-\lambda \mathbb{E}[e^{-\gamma Z}(e^{\phi Z}-1)],\] which is higher because \(Z<0\).

Now, let us turn to actual numbers, i.e., rough estimates of the values in Equation 4. In the US, the (annualized) equity premium oscillates between 4% and 5% (left-hand side - lhs - of Equation 4). For simplicity, most papers fix \(\phi = 1\). For \(\sigma\), Tsai & Wachter 2015 mention a value of 2%, but on recent data (1980-2022), the standard deviation is around 1% (this figure is relatively stable post-1950). If we omit the term coming from extreme risks (jumps), matching the premium would require an implausible level of risk aversion, \(\gamma = 4\%/(1\%)^2=400\).

Luckily, there is the last term. Tsai & Wachter 2015 make the following reasoning. During the 1929 Great Depression, consumption fell by 25%. Since it is a once-in-a-century event (probability of occurrence of 1%), the expected annual consumption loss is \(1\%\times 25\%=0.25\%\). This is in line with the Covid crisis (also a 25% drop in consumption). A drop of 25% implies \(Z=\log(0.75)\approx-0.288\). With \(\gamma = 10\) and \(\lambda = 0.01\), the last term of Equation 4 is \[-0.01\times (e^{10\times0.288}-1)\times(e^{-0.288}-1)=4.2\%.\]
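The arithmetic of this calibration is easy to verify (a quick sketch in Python, reproducing the numbers above):

```python
from math import exp, log

gamma, lam, phi = 10, 0.01, 1     # calibration from the text
Z = log(1 - 0.25)                 # a 25% consumption drop: e^Z = 0.75

# Last term of Equation 4: -lambda * E[(e^{-gamma Z} - 1)(e^{phi Z} - 1)],
# with Z deterministic here.
disaster_term = -lam * (exp(-gamma * Z) - 1) * (exp(phi * Z) - 1)
print(f"{disaster_term:.3f}")     # about 0.042, i.e., 4.2%

# Without it, a 4% premium with sigma = 1% needs an implausible risk aversion:
gamma_implied = 0.04 / 0.01 ** 2  # 400
```

The jump term alone delivers the bulk of the observed premium, which is the point of the calibration.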

There are of course debates around this parametrization, but it solves the C-CAPM problem - thanks to extreme risks.

2.2 Discrete time

An earlier, similar approach is found in Rare Disasters, Asset Prices, and Welfare Costs and On the Size Distribution of Macroeconomic Disasters. Therein, the state variable is both consumption and production (they are conveniently equal), denoted \(Y_t\) and such that \[\log(Y_{t+1})=\log(Y_{t})+g+u_{t+1}+v_{t+1},\]

where \(g\) is some exogenous productivity growth rate and:

  • \(u_{t+1}\) is zero mean Gaussian iid (\(\sigma^2\) variance) and it reflects randomness in usual growth conditions;
  • \(v_{t+1}\) pertains to disasters; it is equal to zero with probability \(1-p\) and to \(\log(1-b)\) with probability \(p\) (disaster with random magnitude \(b \in (0,1]\)).

All in all, the average growth rate is \(g^*=g+\sigma^2/2-p \mathbb{E}[b]\). And in this case, if the utility is proportional to \(Y^{1-\gamma}\), the equity premium is

\[r_S-r = \gamma \sigma^2+p \mathbb{E}[b((1-b)^{-\gamma}-1)] \tag{5}\]

See Rare Disasters, Asset Prices, and Welfare Costs for the full proof.
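Equation 5 is easy to evaluate numerically. A minimal sketch (in Python), with a deterministic disaster size \(b\) and illustrative parameter values - not the papers' calibration - which also checks the approximation behind \(g^*\):

```python
from math import log

gamma, sigma, p, b = 4, 0.02, 0.017, 0.3   # illustrative values; disaster size b fixed

# Equation 5 with a deterministic disaster size:
premium = gamma * sigma**2 + p * b * ((1 - b) ** (-gamma) - 1)

# g* = g + sigma^2/2 - p*E[b] is a first-order approximation of the
# exact log-expected growth log E[Y_{t+1}/Y_t] when p*b is small.
g = 0.025
g_star = g + sigma**2 / 2 - p * b
g_exact = g + sigma**2 / 2 + log(1 - p * b)
```

With these (modest) inputs the premium is below 2%; reaching 4% requires larger disasters, a heavier-tailed \(b\), or more risk aversion, which motivates the estimation exercise below.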

In On the Size Distribution of Macroeconomic Disasters, the authors carry out an estimation exercise on the random (disaster) variable \(b\).

Let us have a look at some data. The main issue here is that disasters in economic data (consumption and GDP growth) are actually rare. For instance, in the US, strong contractions (more severe than -10%) occur at most once a decade. Hence, to build a distribution, one country will not be enough: we need more.

library(WDI)                                          # Package that accesses World Bank data
wb_data <- WDI(                                       # World Bank data
  indicator = c("gdp" = "NY.GDP.MKTP.CD",
                "stock" = "CM.MKT.INDX.ZG",
                "cap" = "CM.MKT.LCAP.CD"),            
  start = 1960, 
  end = 2022) 
gdp_data <- wb_data |>
  filter(is.finite(gdp)) |>
  group_by(country) |>
  mutate(b = 1-gdp/dplyr::lag(gdp),
         gdp_growth = -b) |>
  ungroup() |>
  filter(gdp_growth < 5)

Defining disasters is not obvious. Peak-to-trough declines (maximum drawdowns) are one option. Above, we looked at annual growth rates and selected at most one observation per calendar decade that was below a -10% contraction. Because we have a large panel of countries, this generates enough points.

disasters <- gdp_data |>
  filter(gdp_growth < -0.1) |>
  mutate(yea = substr(year, 1,3)) |>                    # Only one per calendar decade
  na.omit()
disasters <- disasters[!duplicated(disasters |> dplyr::select(country,yea)),] 

temp <- tempfile()
link <- "https://raw.githubusercontent.com/shokru/carbon_emissions/main/country_codes.csv"
download.file(link, temp, quiet = TRUE)
codes <- read_csv(temp)

disasters <- disasters |>
  filter(iso3c %in% codes$`alpha-3`) |>
  mutate(transformed = 1/(1-b))
disasters |> head()
country iso2c iso3c year gdp stock cap b gdp_growth yea transformed
Argentina AR ARG 2002 9.772400e+10 -51.4000015 1.657100e+10 0.6363037 -0.6363037 200 2.749547
Argentina AR ARG 2018 5.248199e+11 -40.6088505 4.598605e+10 0.1845918 -0.1845918 201 1.226380
Australia AU AUS 2009 9.287621e+11 72.3744822 1.261909e+12 0.1205840 -0.1205840 200 1.137118
Australia AU AUS 2016 1.207581e+12 7.0070584 1.268494e+12 0.1066662 -0.1066662 201 1.119402
Austria AT AUT 1997 2.127903e+11 -1.6709353 3.682006e+10 0.1031001 -0.1031001 199 1.114952
Austria AT AUT 2015 3.819711e+11 0.8198199 9.607938e+10 0.1369538 -0.1369538 201 1.158686
power_fun  <- function(x, a) {
  x^(-a)*(a-1)
}
disasters |>
  filter(gdp_growth < -0.1, transformed > 1.05, transformed < 4) |>
  ggplot(aes(x = transformed)) + geom_histogram(aes(y = ..density..), fill = "#2299FF", alpha = 0.6) +
  theme_classic() +
  geom_function(fun = power_fun, args = list(a = 6.86+1))

Let’s compare with the original plot from the paper (same power function depicted). It’s quite close, though we have virtually zero points for transformed values above 2. We consider the left bound/threshold to be \(z=1\).

Choosing a power law that starts at \(z=1\) allows us to obtain a very simple expression for Equation 5:

\[r_S-r = \gamma \sigma^2+p \left(\frac{\alpha}{\alpha-\gamma}-\frac{\alpha}{\alpha+1-\gamma}+\frac{\alpha}{\alpha+1}-1 \right)\]
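The bracketed term can be cross-checked by Monte Carlo: draw \(z=1/(1-b)\) from a Pareto law with tail index \(\alpha\) via inverse transform, and average the disaster term of Equation 5. A sketch (in Python) with illustrative \(\alpha\) and \(\gamma\) - we need \(\alpha>\gamma\), and these are not the estimated values:

```python
import random

alpha, gamma = 10, 4   # illustrative values; requires alpha > gamma
rng = random.Random(42)

# Closed form implied by a Pareto(alpha) law for z = 1/(1-b), z >= 1:
closed = alpha/(alpha - gamma) - alpha/(alpha + 1 - gamma) + alpha/(alpha + 1) - 1

# Monte Carlo of E[b((1-b)^{-gamma} - 1)]:
n = 200_000
total = 0.0
for _ in range(n):
    z = (1 - rng.random()) ** (-1 / alpha)   # Pareto(alpha) draw on [1, inf)
    b = 1 - 1 / z                             # implied disaster size
    total += b * ((1 - b) ** (-gamma) - 1)
mc = total / n
```

Note that the Monte Carlo estimate is only well behaved when \(\alpha\) is comfortably above \(2\gamma\); closer to the boundary, the summand has infinite variance and convergence deteriorates.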

Equipped with this formula, we can try to figure out how to reach a 4% premium - shown with the horizontal grey dashed line in the graph below.
Below, we assume a disaster frequency of \(p=0.03\), a risk aversion \(\gamma \in \{4,5\}\), and a GDP growth volatility of 2%.
Plainly, the level of risk aversion has to be linked with the power index.

premium <- function(a, gamma, s, p){
  gamma*s^2+p*(a/(a-gamma)-a/(a+1-gamma)+a/(a+1)-1)
}
data.frame(a=c(6,8)) |>
  ggplot(aes(x = a)) + theme_classic() + 
  geom_function(fun = premium, args = list(gamma = 5, s = 0.02, p = 0.03), aes(color = "gamma = 5")) +
  geom_function(fun = premium, args = list(gamma = 4, s = 0.02, p = 0.03), aes(color = "gamma = 4")) +
  geom_hline(yintercept = 0.04, linetype = 2, color = "#999999") + xlab("power index") +
  theme(axis.title.y = element_blank(),
        legend.title = element_blank(),
        legend.position = c(0.7,0.7)) 

3 Climate disaster

What follows is taken from the paper Climate change risk by Bansal, Kiku and Ochoa.

3.1 The economy

The economy is modelled such that consumption growth is \[\Delta c_{t+1}=\log(C_{t+1}/C_t)=\mu+\sigma_\eta \eta_{t+1}+D_{t+1}, \tag{6}\]

where \(\eta_{t+1}\) is the consumption growth shock and \(D_{t+1}= N_{t+1}d\) is climate damage, modelled as the product of a Poisson process that counts the number (occurrences) of climate events and \(d<0\), the corresponding decline in consumption growth. The intensity of the Poisson process increases with the temperature anomaly \(T_t\): \[\pi_t = l_0+l_1T_t, \quad l_0,l_1>0.\]

To close the loop, temperature is tied to economic activity: \(T_{t+1}=\chi E_{t+1}\), with \(E_{t+1}\) being the total carbon emissions following the autoregressive dynamics: \[E_{t+1}=\nu E_{t}+\iota (\mu +\sigma_\eta\eta_{t+1})+ \sigma_\zeta \zeta_{t+1},\] where \(\chi>0\) is the sensitivity of temperature to emissions, \(\iota>0\) is the carbon intensity of consumption and \(\nu \in (0,1)\) is a persistence parameter. \(\zeta_t\) is an exogenous innovation.
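Taking expectations in the emissions dynamics (the shocks \(\eta\) and \(\zeta\) have zero mean) gives the recursion \(\mathbb{E}[E_{t+1}]=\nu\,\mathbb{E}[E_t]+\iota\mu\), whose fixed point is \(\iota\mu/(1-\nu)\). A quick sketch (in Python, with purely hypothetical parameter values):

```python
nu, iota, mu = 0.9, 2.0, 0.02   # hypothetical persistence, carbon intensity, drift

# Iterate the expectation recursion E[E_{t+1}] = nu * E[E_t] + iota * mu
# from zero; it converges geometrically (rate nu) to the fixed point.
e = 0.0
for _ in range(500):
    e = nu * e + iota * mu

fixed_point = iota * mu / (1 - nu)   # long-run expected emissions level
```

Because temperature is proportional to emissions, this fixed point also pins down the long-run disaster intensity \(l_0+l_1\chi\,\iota\mu/(1-\nu)\).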

3.2 The representative agent

The utility is modelled with Epstein-Zin preferences:

\[U_t=\left((1-\delta)C_t^{1-\psi^{-1}}+\delta \mathbb{E}_t\left[U_{t+1}^{1-\gamma}\right]^{(1-\psi^{-1})/(1-\gamma)} \right)^{1/(1-\psi^{-1})}\] In this case, the stochastic discount factor is \[m_{t+1}=\theta \log(\delta)-\frac{\theta}{\psi}\Delta c_{t+1}+(\theta-1)r_{c,t+1}, \quad \theta = \frac{1-\gamma}{1-\psi^{-1}}\]

where \(r_{c,t+1}\) is the return on wealth.

Now, we also have individual stocks, indexed by \(i\), with dividend growth subject to \[\Delta d_{i,t+1}=\phi_i\Delta c_{t+1}+\sigma u_{i,t+1},\] where \(\Delta c\) is given by Equation 6 and \(u_{i,t}\) is a Gaussian white noise. It can then be shown (though it is long and we refer to the article for details) that the conditional excess return of asset \(i\) is

\[\mathbb{E}_t[\log(r_{i,t+1})]-r_f=\beta_{i,\eta} \lambda_\eta\sigma^2_\eta+\beta_{i,\zeta}\lambda_\zeta\sigma_\zeta^2+\beta_{i,D}\lambda_D(l_0+l_1T_t)\]

Recall that \(\sigma_\zeta^2\) is the volatility of emissions and \(\sigma^2_\eta\) is the variability of consumption growth that is unrelated to climate risk.

In the paper, the authors empirically test the predictions of the model by running the following regressions:

\[r_{i,t,k}=a_i+\beta_{i,T}\Delta_k T_t+\beta_{i,m}r_{m,t,k}+\beta_{i,c}\Delta_kc_t+e_{i,t,k},\]

where the subscript \(k\) indicates quantities computed over \(k\) periods. \(r_{m,t,k}\) is the excess return on the aggregate stock market and serves as the main control variable.

3.3 Data

First, we set the horizon.

k <- 3

The gathering of all variables requires several sources.
We start with a source that lists the US states.

states <- read_csv("https://raw.githubusercontent.com/jasonong/List-of-US-States/master/states.csv")
states <- states |> 
  filter(!State %in% c("District of Columbia", "Alaska", "Hawaii")) |> # NOAA codes 1-48 cover the contiguous states
  pull(State)

We then fetch the data for mainland temperature anomalies and format the result (we exclude Alaska and Hawaii). It comes from the National Centers for Environmental Information (NCEI) of the National Oceanic and Atmospheric Administration (NOAA).

data_temp <- c()
for(j in 1:48){
  data_temp <- data_temp |> bind_rows(
    read_csv(paste0("https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/statewide/time-series/",
                    j,
                    "/tavg/all/4/1895-2023.csv?base_prd=true&begbaseyear=1895&endbaseyear=1945"), 
             skip = 4) |>
      mutate(state = states[j] |> as.character())
  )
}
data_temp <- data_temp |>
  rename(date = Date) |>
  mutate(date = make_date(year = substr(date, 1, 4) |> as.numeric(),
                          month = substr(date, 5, 6) |> as.numeric(),
                          day = 28)) |>
  group_by(date) |>
  mutate(anom = mean(Anomaly)) |>
  ungroup() |>
  filter(date > "1945-01-01")
data_temp |> head()
date Value Anomaly state anom
1945-01-28 44.8 -1.0 Alabama -1.266667
1945-02-28 51.4 3.7 Alabama 2.447917
1945-03-28 63.4 7.8 Alabama 6.466667
1945-04-28 65.4 2.5 Alabama 1.079167
1945-05-28 68.4 -2.7 Alabama -2.462500
1945-06-28 78.2 -0.1 Alabama -1.912500

We then keep only one aggregated series.

data_temp <- data_temp |>
  filter(state == "Alabama") |>
  select(date, anom) |>
  mutate(ym = paste0(year(date), month(date)),
         d_anom = anom/lag(anom, k) - 1)

For the aggregate stock market returns, there are several possible choices, like Ken French’s data library. Below, we recycle the data from the previous exercise.

# y <- getSymbols("SPY", src = 'yahoo',         # Data from Yahoo Finance
#                      from = min_date,         # Start date
#                      to = max_date,           # End date
#                      auto.assign = TRUE, 
#                      warnings = FALSE) %>% 
#   map(~Ad(get(.))) %>%                        # Retrieving the data
#   reduce(merge)

spy <- SPY %>%                                        # Start from prices
  to.monthly(indexAt = "lastof", OHLC = FALSE) %>%    # Convert to monthly 
  data.frame(Date = index(.)) %>%                     # Convert the index to a date
  remove_rownames() |>
  select(Date, SPY.Close) |>
  rename(date = 1, spy = 2) |>
  mutate(ym = paste0(year(date), month(date)),
         spy = spy/lag(spy, k) - 1) |>
  na.omit()
spy |> head()
date spy ym
4 1993-04-30 0.0021337 19934
5 1993-05-31 0.0182970 19935
6 1993-06-30 -0.0027663 19936
7 1993-07-31 0.0184528 19937
8 1993-08-31 0.0297167 19938
9 1993-09-30 0.0194175 19939

Next, consumption data (from previous session).

temp <- tempfile()
link <- "https://fred.stlouisfed.org/graph/fredgraph.xls?id=PCE"
download.file(link, temp, quiet = TRUE)
cons_data <- read_excel(temp, skip = 10) |>
  mutate(observation_date = observation_date - 1,   # Last day of the month
         observation_date = as.Date(observation_date),
         D_pce = PCE/lag(PCE, k)-1) |>  
  rename(date = observation_date) |> 
  mutate(ym = paste0(year(date), month(date))) |>
  filter(date > min_date) 

Finally, let’s find some individual stock data to run the regressions.

tickers <- c("AAPL", "BA", "C", "CAT", "CVX", "DIS", "F", "GE", "MCD", "MMC",  
             "MSFT", "PFE", "WMT", "XOM")  
stocks <- getSymbols(tickers, src = 'yahoo',  # Data from Yahoo Finance
                     from = min_date,         # Start date
                     to = max_date,           # End date
                     auto.assign = TRUE, 
                     warnings = FALSE) %>% 
  map(~Ad(get(.))) %>%                        # Retrieving the data
  reduce(merge) %>%                           # Merge in one dataframe
  `colnames<-`(tickers)                       # Set the column names
stocks |> head()                              # Have a look at the result!
                AAPL       BA        C      CAT      CVX      DIS        F
1990-01-02 0.2634136 11.04362 11.79453 3.260036 4.892342 6.945646 2.431400
1990-01-03 0.2651813 11.31298 11.94510 3.294644 4.812721 6.960682 2.444832
1990-01-04 0.2660660 11.26809 11.79453 3.308488 4.750794 6.953166 2.444832
1990-01-05 0.2669498 11.11097 11.89491 3.287723 4.680018 6.975715 2.411250
1990-01-08 0.2687176 11.24564 11.99528 3.266959 4.724252 7.035610 2.411250
1990-01-09 0.2660660 11.11097 11.84472 3.246193 4.688868 7.028086 2.370950
                 GE      MCD      MMC      MSFT       PFE      WMT      XOM
1990-01-02 14.37143 4.494273 5.508575 0.3820953 1.0072415 3.626852 3.998406
1990-01-03 14.34451 4.445948 5.473264 0.3842487 1.0107880 3.626852 3.958420
1990-01-04 14.26377 4.365406 5.420294 0.3955498 1.0267487 3.607612 3.918437
1990-01-05 14.12920 4.284864 5.303892 0.3858630 1.0178819 3.569130 3.898445
1990-01-08 14.20996 4.365406 5.268531 0.3917825 1.0090151 3.617231 3.958420
1990-01-09 13.91391 4.333187 5.374612 0.3907061 0.9877356 3.521028 3.878454
rm(list = tickers)                            # Remove unnecessary variables from the env.

returns_m <- stocks %>%                               # Start from prices
  to.monthly(indexAt = "lastof", OHLC = FALSE) %>%    # Convert to monthly 
  data.frame(Date = index(.)) %>%                     # Convert the index to a date
  remove_rownames() %>%                               # Remove index => converted into row names
  pivot_longer(-Date, names_to = "Asset", 
               values_to = "Price") %>%               # Put the data in 'tidy' format
  group_by(Asset) %>%                                 # Group the rows by asset to compute returns
  mutate(return = Price / dplyr::lag(Price, k) - 1,   # Compute the returns
         return = return |> round(4)) %>%             # To ease presentation
  select(-Price) %>%                                  # Remove price column
  na.omit()                                           # Discard rows with missing/NA data

Lastly, we aggregate all the variables together.

data <- returns_m |> 
  mutate(ym = paste0(year(Date), month(Date))) |>
  left_join(cons_data, by = "ym") |>
  left_join(spy, by = "ym") |>
  left_join(data_temp, by = "ym") |>
  select(Date, Asset, return, D_pce, spy, d_anom)
data |> head()
Date Asset return D_pce spy d_anom
1990-04-30 AAPL 0.1618 0.0127676 NA -0.8165865
1990-04-30 BA 0.1829 0.0127676 NA -0.8165865
1990-04-30 C 0.1005 0.0127676 NA -0.8165865
1990-04-30 CAT 0.1139 0.0127676 NA -0.8165865
1990-04-30 CVX -0.0046 0.0127676 NA -0.8165865
1990-04-30 DIS 0.0626 0.0127676 NA -0.8165865

3.4 Results

We can then proceed with the regressions, one per stock, thanks to grouped modelling and the {broom} package.

models <- data |>
  group_by(Asset) |>
  group_modify(.f = ~ broom::tidy(lm(return ~ spy + D_pce + d_anom, 
                                     data = .x)))
models |> head(9)
Asset term estimate std.error statistic p.value
AAPL (Intercept) 0.0431894 0.0129165 3.3437489 0.0009127
AAPL spy 1.2749966 0.1565434 8.1446824 0.0000000
AAPL D_pce 0.4494647 0.6608625 0.6801184 0.4968630
AAPL d_anom 0.0000962 0.0004276 0.2249546 0.8221412
BA (Intercept) -0.0008003 0.0082689 -0.0967866 0.9229493
BA spy 1.1031224 0.1002164 11.0074069 0.0000000
BA D_pce 1.2068302 0.4230726 2.8525370 0.0045855
BA d_anom 0.0001535 0.0002738 0.5607232 0.5753322
C (Intercept) -0.0177004 0.0085289 -2.0753395 0.0386589

The interesting coefficient is the one related to temperature changes.

models |> 
  filter(term == "d_anom") |>
  ggplot(aes(x = estimate, 
             y = reorder(Asset, estimate), 
             fill = if_else(p.value < 0.05, "signif.", "non signif."))) + geom_col() +
  theme_classic() + 
  theme(axis.title = element_blank(),
        legend.position = c(0.7,0.3),
        legend.title = element_blank())

In the original paper, the authors use portfolio (aggregate) returns on the lhs of the equation.
They find mostly negative coefficients, with varying degrees of significance:

The numbers are those of the last 2 columns.

3.5 Asset pricing with crises

Inspired from Time-varying rare disaster risk and stock returns, we fetch data from the International Crisis Behavior Project.

This is a purely empirical contribution; there is no theoretical model backing the regression specifications.

In the paper, the authors test the link between global returns and crises: \[r_t=a + b c_t+e_t,\] where \(c_t\) is the number of crises during the period.

crises <- read.csv("icb1v15.csv")
crises <- crises |>
  select(crisname, yrtrig, motrig, yrterm, moterm) 
crises |> head(6)
crisname yrtrig motrig yrterm moterm
RUSSIAN CIVIL WAR I 1918 5 1920 4
COSTA RICAN COUP 1918 5 1919 9
RUSSIAN CIVIL WAR II 1918 6 1919 9
BALTIC INDEPENDENCE 1918 11 1920 8
TESCHEN 1919 1 1920 7
HUNGARIAN WAR 1919 3 1919 8
crises <- crises |> filter(yrtrig > 1980) # Keep only the recent crises

Now, we have to translate this data into a column that counts the number of crises.

counting <- function(start_end, dates){
  start <- substr(start_end, 1, 6) |> as.numeric()
  end <- substr(start_end, 8, 13) |> as.numeric()
  tibble(date = as.Date(dates)) |>
    mutate(y = year(date),
           m = month(date),
           m = if_else(nchar(m)==2, as.character(m), paste0("0",m)), # Add "0" if need be
           ym = paste0(y, m) |> as.numeric(),
           crisis = (ym >= start) * (ym <= end))
}
counting("199307_199401", spy$date) |> head(12) # The format is begin_end of the crisis
date y m ym crisis
1993-04-30 1993 04 199304 0
1993-05-31 1993 05 199305 0
1993-06-30 1993 06 199306 0
1993-07-31 1993 07 199307 1
1993-08-31 1993 08 199308 1
1993-09-30 1993 09 199309 1
1993-10-31 1993 10 199310 1
1993-11-30 1993 11 199311 1
1993-12-31 1993 12 199312 1
1994-01-31 1994 01 199401 1
1994-02-28 1994 02 199402 0
1994-03-31 1994 03 199403 0
counting_col <- function(start_end, dates){
  counting(start_end, dates) |> pull(crisis)
}
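The interval test at the heart of counting() relies on zero-padded year-month codes ordering correctly as numbers. The same logic, stripped to its essence (a Python sketch with hypothetical inputs mirroring the example above):

```python
def crisis_flag(start_end, ym):
    # start_end has the form "YYYYMM_YYYYMM" (begin_end of the crisis)
    start, end = int(start_end[:6]), int(start_end[7:13])
    return int(start <= ym <= end)   # 1 inside the crisis window, 0 otherwise

# April 1993 to March 1994, as integer year-month codes
months = [199300 + m for m in range(4, 13)] + [199400 + m for m in range(1, 4)]
flags = [crisis_flag("199307_199401", ym) for ym in months]
print(sum(flags))   # 7 months flagged, as in the table above
```

The zero-padding matters: without it, "19935" would compare incorrectly against "199312", which is exactly why the R code pads single-digit months.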

Okay, so we just need the correct input and to loop on all crises.
To do so, we will resort to functional programming, with the map() function from the {purrr} package.
We want to stack the output by columns, hence we use the map_dfc() alternative.

crises <- crises |>
  mutate(motrig = if_else(nchar(motrig)==2, as.character(motrig), paste0("0", motrig)), # Add zeros
         moterm = if_else(nchar(moterm)==2, as.character(moterm), paste0("0", moterm)),
         start_end = paste0(yrtrig, motrig, "_", yrterm, moterm))
head(crises, 5)
crisname yrtrig motrig yrterm moterm start_end
CHAD/LIBYA V 1981 01 1981 11 198101_198111
ECUADOR/PERU BDR. III 1981 01 1981 04 198101_198104
MOZAMBIQUE RAID 1981 01 1981 03 198101_198103
IRAQ NUCLEAR REACTOR 1981 01 1981 06 198101_198106
ESSEQUIBO II 1981 04 1983 03 198104_198303
crisis_mat <- map_dfc(crises$start_end, counting_col, dates = spy$date)  

All we need to do is sum the rows of the matrix.

crises <- tibble(date = spy$date,
                 n = crisis_mat |> rowSums(na.rm = T)) |>
  mutate(year = year(date),
         month = month(date)) 

crises |> ggplot(aes(x = date, y = n)) + geom_line() +
  theme_light() + theme(axis.title = element_blank())

First, let’s see what happens for a local economy, the US.
We use the {broom} package for coefficient (tidy() function) and fit (glance() function) output.

data_crises <- spy |>
  mutate(year = year(date),
         month = month(date)) |>
  left_join(crises |> select(-date),     # Remove date which will be duplicate
            by = c("year", "month")) |>
  select(date, spy, n) |>
  mutate(n = if_else(is.na(n), 0, n)) |>
  filter(year(date) < 2022)

fit <- lm(spy ~ n, data = data_crises)
fit |> tidy()
term estimate std.error statistic p.value
(Intercept) 0.0391420 0.0076716 5.102208 0.0000006
n -0.0080063 0.0033364 -2.399665 0.0169429
fit |> glance()
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
0.0165111 0.0136438 0.0727781 5.758393 0.0169429 1 415.4866 -824.9733 -813.4426 1.81675 343 345

The signs are the same as in Time-varying rare disaster risk and stock returns (positive intercept, negative coefficient), but only the intercept has a truly negligible \(p\)-value (the slope's is 0.017). And the \(R^2\) is very small.

Next, let’s try for a synthetic global asset (world market). The data is sampled yearly.
And there are years with zero crises as well (e.g., 2000).

fit <- lm(return ~ n,
          data = wb_data |>
            filter(is.finite(cap), is.finite(stock)) |>
            group_by(year) |>
            summarise(return = sum(cap*stock/100)/sum(cap)) |>
            left_join(crises |> group_by(year) |> summarise(n = max(n, na.rm = T)),
                      by = "year") |> na.omit()
) 
fit |> tidy()
term estimate std.error statistic p.value
(Intercept) 0.1597708 0.1003851 1.5915795 0.1235660
n -0.0194850 0.0295561 -0.6592564 0.5155283
fit |> glance()
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
0.0164413 -0.0213879 0.1871793 0.434619 0.5155283 1 8.226505 -10.45301 -6.456396 0.9109384 26 28

Again, a negative coefficient, but no statistical significance (smaller sample size?) and a negative adjusted \(R^2\).
It would be interesting to investigate lead-lag effects. This is left as an exercise.

Footnotes

  1. See Proposition 20.15 in Stochastic calculus for jump processes.↩︎